
Run

A collection can be executed multiple times. A Run is a single execution of a collection.

Endpoints

POST /v1/async/collections/{collection_id}/run

This endpoint triggers a Run of a collection.

A collection_id is required to make this request.

Response Example:

{
"run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
"status": "in_progress",
"total_requests": 2,
"success_requests": 0,
"failed_requests": 0,
"timeout_requests": 0,
"collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
"job_ids": ["job-uuid-1", "job-uuid-2"],
"callback_url": "https://your-server.com/webhook",
"callback_status": "pending"
}

Details about the returned fields can be found in Reference.
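
As a rough illustration, the trigger call could be sketched in Python with only the standard library. The host and the Bearer header are taken from the curl examples on this page; the empty request body is an assumption (no body is documented here), and the function names `build_run_request` and `trigger_run` are hypothetical.

```python
import json
import urllib.request

# Host as it appears in the curl examples on this page.
API_BASE = "https://api.scrapingpros.com"


def build_run_request(collection_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that triggers a Run.

    Assumption: the endpoint needs no request body.
    """
    url = f"{API_BASE}/v1/async/collections/{collection_id}/run"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def trigger_run(collection_id: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body (run_id, status, ...)."""
    req = build_run_request(collection_id, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Persisting the returned run_id (and the collection_id) right away makes later status checks and recovery simpler.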

GET /v1/async/collections/{collection_id}/runs

Lists every run of a given collection, newest first.

Useful for two things:

  • Audit / dashboards: see all the times a collection has been executed.
  • Recovery after a submit timeout: if you persisted the collection_id but the response to your POST /run request was lost, re-attach to the live run with ?status_filter=in_progress instead of triggering a duplicate.
# All runs
curl 'https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs' \
-H 'Authorization: Bearer <API-KEY>'

# Just the live run
curl 'https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs?status_filter=in_progress' \
-H 'Authorization: Bearer <API-KEY>'

Response Example:

{
"items": [
{
"run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
"status": "in_progress",
"total_requests": 100,
"success_requests": 73,
"failed_requests": 5,
"timeout_requests": 0,
"collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
"callback_url": null,
"callback_status": null,
"created_at": 1777853217.82
}
],
"total": 1
}
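
The recovery case described above could look roughly like the sketch below. The two callables are hypothetical stand-ins: `list_runs(params)` is assumed to perform the GET .../runs request with the given query parameters and return the decoded JSON, and `trigger_run()` is assumed to perform the POST .../run request.

```python
from typing import Callable, Optional


def find_live_run(list_runs: Callable[[dict], dict]) -> Optional[dict]:
    """Return the newest in-progress run, or None if there is none."""
    page = list_runs({"status_filter": "in_progress"})
    items = page.get("items", [])
    # The listing is newest first, so the first item is the latest live run.
    return items[0] if items else None


def reattach_or_trigger(
    list_runs: Callable[[dict], dict],
    trigger_run: Callable[[], dict],
) -> str:
    """Reuse a live run if one exists; otherwise start a new one."""
    live = find_live_run(list_runs)
    if live is not None:
        return live["run_id"]
    return trigger_run()["run_id"]
```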

GET /v1/async/collections/{collection_id}/runs/{run_id}

This endpoint returns the current status of a Run, including the webhook delivery status.

Response Example:

{
"run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
"status": "completed",
"total_requests": 2,
"success_requests": 2,
"failed_requests": 0,
"timeout_requests": 0,
"collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
"job_ids": ["job-uuid-1", "job-uuid-2"],
"callback_url": "https://your-server.com/webhook",
"callback_status": "sent"
}
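
If no webhook is configured, a client can poll this endpoint until the run completes. The following is a minimal sketch: `fetch_run` is a hypothetical callable that performs the GET and returns the decoded JSON, and the 10-second interval and poll limit are arbitrary choices, not documented values.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_run: Callable[[], dict],
    interval_s: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is "completed" and return the final run document."""
    for attempt in range(max_polls):
        run = fetch_run()
        if run["status"] == "completed":
            return run
        # Do not sleep after the last attempt; fail straight away instead.
        if attempt + 1 < max_polls:
            sleep(interval_s)
    raise TimeoutError(f"run still in progress after {max_polls} polls")
```

The `sleep` parameter is injectable only so the loop can be exercised in tests without waiting.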

GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs

Lists all jobs of a run with cursor-based pagination. Returns metadata (URL, status, timings, custom_id, validator fields) without the HTML body — use the /result endpoint below to download content.

Query parameters:

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| cursor | string | (none) | Opaque cursor returned by the previous page. Omit on first call. Encoding depends on order_by; mixing them returns 400. |
| limit | integer | 100 | Page size. Min 1, max 1000. |
| status_filter | string / CSV | (none) | Single value or CSV: completed, failed, timeout, processing. Example: status_filter=completed,failed,timeout. |
| since_completed_at | ISO 8601 string | (none) | Returns only rows with completed_at strictly greater than this value. Accepts Z, +00:00, or naive (UTC). Rows with NULL completed_at are excluded. |
| order_by | id or completed_at | id | Sort order. Use completed_at for streaming completions as they finish. |
| order_dir | asc or desc | asc | Honored only for order_by=completed_at. |

Response Example:

{
"items": [
{
"job_public_id": "e3a1b2c4-...",
"run_public_id": "9b64941a-...",
"collection_id": "9634997b-...",
"status": "completed",
"url": "https://example.com/tours/123",
"custom_id": "tour_12345",
"url_truncated": false,
"status_code": 200,
"message": null,
"queued_at": "2026-04-23T12:00:00.123",
"started_at": "2026-04-23T12:00:02.267",
"completed_at": "2026-04-23T12:00:03.637",
"execution_time_ms": 1370,
"retries_attempted": 0,
"block_reason": null,
"protection_stack": ["cloudflare"],
"rule_hits": []
}
],
"cursor_next": "MzQ=",
"has_more": true
}
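
One way to stream completions with order_by=completed_at and since_completed_at is to keep a watermark of the last completed_at seen and pass it on the next poll. This is only a sketch: `fetch_page(params)` is a hypothetical callable that performs the GET and returns the decoded JSON, and resending since_completed_at together with the cursor on follow-up pages is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def drain_new_completions(
    fetch_page: Callable[[dict], dict],
    since: Optional[str] = None,
) -> Tuple[List[dict], Optional[str]]:
    """Fetch every job completed after `since`; return (jobs, new_watermark)."""
    params = {"order_by": "completed_at", "order_dir": "asc"}
    if since:
        params["since_completed_at"] = since
    jobs: List[dict] = []
    while True:
        page = fetch_page(dict(params))  # pass a copy so callers can log it safely
        jobs.extend(page["items"])
        if not page["has_more"]:
            break
        # order_by stays the same, so the cursor encoding remains valid.
        params["cursor"] = page["cursor_next"]
    # Ascending order means the last job carries the largest completed_at.
    watermark = jobs[-1]["completed_at"] if jobs else since
    return jobs, watermark
```

Calling the function again with the returned watermark fetches only jobs that completed later, since the filter is strictly greater.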

Timing: jobs appear in this listing roughly 5 seconds after completion (internal metadata flusher tick). The ordered sequence of queued_at → started_at → completed_at lets you compute queue wait time and execution latency per job.
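
The queue wait and execution latency mentioned above can be derived from the three timestamps. A small sketch, assuming the timestamps use the naive ISO 8601 form shown in the response example (the helper name `job_latencies_ms` is hypothetical):

```python
from datetime import datetime, timedelta
from typing import Tuple

ONE_MS = timedelta(milliseconds=1)


def job_latencies_ms(job: dict) -> Tuple[int, int]:
    """Return (queue_wait_ms, execution_ms) for one job of the listing."""
    queued = datetime.fromisoformat(job["queued_at"])
    started = datetime.fromisoformat(job["started_at"])
    completed = datetime.fromisoformat(job["completed_at"])
    # Dividing one timedelta by another yields a plain integer count.
    queue_wait = (started - queued) // ONE_MS
    execution = (completed - started) // ONE_MS
    return queue_wait, execution


job = {
    "queued_at": "2026-04-23T12:00:00.123",
    "started_at": "2026-04-23T12:00:02.267",
    "completed_at": "2026-04-23T12:00:03.637",
}
print(job_latencies_ms(job))  # (2144, 1370)
```

For the job in the response example, the computed execution time matches the reported execution_time_ms of 1370.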

Retention: listing metadata is retained for 90 days after the run (MySQL partitioned tables). HTML bodies are retained for 48 hours — beyond that window, the /result endpoint returns 404 but the listing above is still available.

Pagination pattern:

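import requests

# jobs_url, H (auth headers) and handle() are supplied by the caller, e.g.
# H = {"Authorization": "Bearer <API-KEY>"}

cursor = None
while True:
    params = {"limit": 500}
    if cursor:
        params["cursor"] = cursor  # omitted on the first call
    resp = requests.get(jobs_url, headers=H, params=params)
    resp.raise_for_status()  # surfaces errors such as the 400 for a mismatched cursor
    page = resp.json()
    for job in page["items"]:
        handle(job)
    if not page["has_more"]:
        break
    cursor = page["cursor_next"]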

See full reference at apiReference/scrapeo_asincronico.

GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result

Retrieves the full result of a specific job (HTML body, extracted data, timings). Results are available for 48 hours after job completion.

Response Example:

{
"url": "https://example.com/tours/123",
"custom_id": "tour_12345",
"html": "<!doctype html>...",
"statusCode": 200,
"timings": {"queue_wait_ms": 45, "proxy_ms": 120},
"potentiallyBlockedByCaptcha": false,
"extracted_data": null
}

The response includes url and custom_id so you can correlate each result back to your original request without relying on insertion order.

If the result is unavailable, the API responds with 404 and a structured detail object whose error_code tells you why:

HTTP 404
{
"detail": {
"error_code": "result_lost",
"message": "Job result is unavailable due to a service incident during the completion window. Contact support if the data is critical — it may qualify for refund.",
"completed_at": "2026-04-30T12:34:56Z",
"age_hours": 0.4
}
}

| error_code | Meaning | Suggested action |
| --- | --- | --- |
| result_pending | Job is still in flight, or the worker did not store a result yet. | Retry shortly. |
| result_expired | More than 24 h since completion; the body has been pruned. | Re-run the collection if you still need the data. |
| result_lost | Body unavailable within the 24 h window. | Contact support; it may qualify for refund. |
| job_id_invalid | We have no record of this job. | Verify the IDs in your client. |
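
A client can branch on error_code rather than on the HTTP status alone. The sketch below maps each documented code to an action label; the labels ("retry", "rerun", and so on) and the function name are hypothetical and exist only for illustration.

```python
# Action labels are illustrative; only the error_code keys come from the table.
ACTIONS = {
    "result_pending": "retry",
    "result_expired": "rerun",
    "result_lost": "contact_support",
    "job_id_invalid": "check_ids",
}


def next_action(status_code: int, body: dict) -> str:
    """Decide what to do with a response from the /result endpoint."""
    if status_code == 200:
        return "use"
    if status_code != 404:
        return "unexpected"
    detail = body.get("detail")
    code = detail.get("error_code") if isinstance(detail, dict) else None
    # Unknown or missing codes fall through to "unexpected".
    return ACTIONS.get(code, "unexpected")
```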

Webhooks

If the collection has a callback_url configured, a signed HTTP POST is automatically sent upon run completion:

{
"event": "run.completed",
"run_id": "uuid",
"collection_id": "uuid",
"status": "completed",
"total_requests": 2,
"success_requests": 2,
"failed_requests": 0,
"job_ids": ["job-uuid-1", "job-uuid-2"],
"results_url": "https://api.scrapingpros.com/v1/async/collections/{cid}/runs/{rid}",
"timestamp": "2026-04-06T20:30:00Z"
}

Security: The webhook includes an HMAC-SHA256 signature in the headers:

  • X-SP-Signature: sha256=<hex> -- signature of {timestamp}.{body}
  • X-SP-Timestamp: <unix_epoch>
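
A receiver could verify the signature along these lines. This is a sketch under several assumptions that this page does not confirm: the HMAC key is a shared secret you already hold, the signed string is the timestamp header value, a dot, and the raw request body bytes, and the 300-second replay window is an arbitrary choice. The function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Mapping, Optional


def expected_signature(secret: bytes, timestamp: str, body: bytes) -> str:
    """Compute sha256=<hex> over "{timestamp}.{body}"."""
    signed = timestamp.encode("utf-8") + b"." + body
    return "sha256=" + hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes,
    headers: Mapping[str, str],
    body: bytes,
    now: Optional[float] = None,
    max_age_s: int = 300,  # assumed tolerance, not a documented value
) -> bool:
    """Return True only for a fresh request with a matching signature."""
    timestamp = headers.get("X-SP-Timestamp", "")
    signature = headers.get("X-SP-Signature", "")
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if now is None:
        now = time.time()
    if abs(now - sent_at) > max_age_s:
        return False  # too old or too far in the future
    expected = expected_signature(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Verify against the raw bytes received, before any JSON parsing, because re-serializing the payload can change the byte sequence.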

Retries: If delivery fails (timeout or 5xx response), it is retried automatically up to 5 times with backoff delays of 1 min, 5 min, 30 min, 2 h, and 12 h. The callback_status field reflects the current delivery status.

Reference

  • run_id: Generated UUID of the run. Use this value to track the run with GET /v1/async/collections/{collection_id}/runs/{run_id}.
  • status: The current status of the Run. It can take 2 values: in_progress or completed.
  • total_requests: Number of requests in the collection.
  • success_requests: Number of requests that delivered usable content (HTTP 2xx + no block signal). A job whose worker completed but whose target returned 4xx/5xx or a captcha page is counted under failed_requests, not here.
  • failed_requests: Number of requests that failed.
  • timeout_requests: Number of requests that timed out.
  • collection_id: UUID of the collection.
  • job_ids: List of UUIDs of the individual jobs. Use these to retrieve results with the job result endpoint. Available for the lifetime of the run, regardless of status — you can always enumerate the jobs of a run, even after status=completed and after the result bodies have expired (the listing metadata is kept for 90 days).
  • callback_url: Configured webhook URL (if set).
  • callback_status: Webhook delivery status: pending (in progress), sent (delivered), failed (delivery failed), retrying (retrying delivery).
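
The counters above can be combined into a simple progress figure. The sketch assumes that success_requests, failed_requests, and timeout_requests never overlap, which this page implies but does not state outright; the helper name `run_progress` is hypothetical.

```python
def run_progress(run: dict) -> dict:
    """Summarize how far a run has progressed from its counters."""
    # Assumption: the three counters are disjoint.
    finished = (
        run["success_requests"]
        + run["failed_requests"]
        + run["timeout_requests"]
    )
    total = run["total_requests"]
    percent = round(100 * finished / total, 1) if total else 0.0
    return {"finished": finished, "pending": total - finished, "percent": percent}


# Counters from the in-progress run in the listing example.
run = {
    "total_requests": 100,
    "success_requests": 73,
    "failed_requests": 5,
    "timeout_requests": 0,
}
print(run_progress(run))  # {'finished': 78, 'pending': 22, 'percent': 78.0}
```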